Systems and methods for performing sound source localization

ABSTRACT

Systems and methods for performing sound source localization are provided. In one aspect, a method for locating a sound source using a computing device subdivides a space into subregions. The method then computes a sound source power for each of the subregions and determines which of the sound source powers is the largest. When the volume of the subregion having the largest sound source power is less than a threshold volume, the method outputs that subregion. Otherwise, the stages of subdividing, computing, and determining are repeated for the subregion having the largest sound source power.

TECHNICAL FIELD

This disclosure relates to signal processing of audio signals received at microphone arrays.

BACKGROUND

In recent years, theoretical and experimental work on the use of microphone arrays has received increased interest. For example, microphone arrays can be used in audio processing tasks such as speech enhancement, denoising, interference rejection, de-reverberation, and acoustic echo cancellation. While audio processing performance may be improved by increasing the number of microphones, practical uses of large microphone arrays are often limited. For example, technological advancements enable the production of integrated microphone arrays with low cost microphones. However, in general, the overall cost of processing the large amounts of audio data captured by microphone arrays can be prohibitively high due to the time expended in transmitting the large number of audio signals to a central processing location and the cost of computational resources used to process the audio signals at the central location. Developers and users of microphone arrays continue to seek systems and methods that improve processing of audio data generated by microphone arrays.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows an example system for determining an approximate coordinate location of a sound source located within a search space.

FIG. 2 shows an example virtual, rectangular prism-shaped subregion within which a sound source is located.

FIG. 3 shows a control-flow diagram of an example method for determining an approximate coordinate location of a sound source located within a search space.

FIGS. 4-5 show an example of a branch-and-bound search for a subregion of a search space containing a sound source.

FIG. 6 shows a graph summarizing the iterative process of the branch-and-bound method for determining subregions of a search space.

FIG. 7 shows a graph summarizing a general iterative process of an example branch-and-bound method for determining a subregion containing a sound source.

FIG. 8 shows a portion of a search space with an array of microphones electronically connected to a computing device.

FIG. 9 shows a schematic representation of a computing device that determines a subregion containing a sound source.

DETAILED DESCRIPTION

This disclosure is directed to systems and methods for performing sound source localization. The methods employ a recursive subdivision of a three-dimensional search space in which the sound source is located. An approximate coordinate location of the sound source is determined using volumetric integration to compute upper bounds on the sound source power values of subregions of the search space.

FIG. 1 shows an example system 100 for determining the approximate coordinate location of a sound source 102 located within a search space 104. The system 100 includes an array of microphones 106 disposed within a wall 108 of the search space 104. The system 100 also includes a computing device 110 electronically connected to each microphone 106. Systems are not limited to regularly spaced microphones 106. In practice, the microphones 106 can be irregularly spaced within one of the walls 108, or hanging from wires in the ceiling, and the number of microphones can vary. The microphones 106 can also be located outside the search space 104. The sound source 102 can be a person speaking or other sound generating device or entity. The search space 104 can be a room, an auditorium, or any space surrounded by walls, a floor, and ceiling. For the sake of convenience, the search space 104 is represented as a rectangular prism-shaped room with rectangular-shaped walls and rectangular-shaped floor and ceiling. However, methods and systems are not intended to be so limited with respect to the geometries of the search spaces to which they can be applied. In practice, the search space can have any regular or irregular shape.

Sounds generated by the sound source 102 are collected by each microphone 106 and converted into electronic signals that are sent to the computing device 110, which processes the electronic signals to determine an approximate coordinate location of the sound source 102. FIG. 1 and subsequent figures include Cartesian coordinate system 112 with the origin located in a corner of the search space 104. In practice, the origin of the coordinate system used to assign the approximate coordinate location of a sound source can be located at any suitable point within or outside the search space, and depending on the configuration of the search space, the coordinate system can be a cylindrical coordinate system, spherical coordinate system, or any other suitable coordinate system.

FIG. 2 shows an example virtual, rectangular prism-shaped subregion 202 within which the sound source 102 is located. A vector 204 emanating from the origin 206 identifies approximate Cartesian coordinates x=(x, y, z) of the sound source 102. In the example of FIG. 2, the coordinates x can be functions 208 of the dimensions of the search space 104 denoted by w, l, and h, which represent the width, length, and height, respectively, of the search space 104. For example, the coordinates x can be the coordinates of any one of the eight corners of the subregion 202, or the coordinates x can be the center point of the subregion 202. The subregion 202 is determined by iteratively subdividing the search space into smaller subregions and determining the subregion with the largest associated sound source power, denoted by W(S), where S is a six-dimensional coordinate denoting a subregion. For example, S can include Cartesian coordinates x, y, and z that identify a coordinate location of a point, such as a corner of the subregion S, and include three additional values defining the length of the subregion sides. An example of equations used to compute a measure of the sound source power W(S) of a subregion S is described in greater detail below with reference to FIG. 8. This power measure is not the physical measure of power, but corresponds to the values obtained from calculations using the microphone signals. The iterative process of determining the subregion containing the sound source 102 is a branch-and-bound search that implicitly enumerates and tests all subregions and stops when the volume of the subregion with the largest associated sound source power is less than a predefined threshold volume V_(th).

FIG. 3 shows a control-flow diagram of an example method 300 for determining an approximate coordinate location of a sound source located within a search space. Stages of the method 300 are described with reference to example subdivisions of the search space 104 shown in FIGS. 4 and 5 in order to appreciate the operations performed in each stage of the method 300. In the following description, S_(full) denotes the full search space 104, and L represents a list of pairs [S, W(S)] associated with the subregions S to be evaluated. In stage 301, the list L is initially populated with a single element [S_(full), W(S_(full))], where W(S_(full)) is the computed sound source power for the un-subdivided search space 104, W_(max) is a parameter initially assigned the value “0,” and the parameter S_(max) is left undefined. The sound source power of the search space W(S_(full)) can be computed using Equation (1) described below. In stage 302, when the list L is empty, the method stops. Otherwise, the method proceeds to stage 303. In stage 303, an element [S_(c), W(S_(c))] in the list L is selected, followed by deleting the element [S_(c), W(S_(c))] from the list L in stage 304, where c is a positive integer index. Initially, when the list L includes only the element [S_(full), W(S_(full))], the list L is empty after completing stage 304. In stage 305, when W(S_(c)) is less than or equal to W_(max), the method returns to stage 302. Otherwise, the method proceeds to stage 306. In stage 306, when the volume of S_(c), denoted by vol(S_(c)), is less than or equal to V_(th), the method proceeds to stage 307. Otherwise, the method proceeds to stage 308. In stage 307, W_(max) is assigned the value of W(S_(c)) and S_(max) is assigned the value S_(c) (containing current subregion size and position) and the method returns to stage 302.
Initially, W(S_(full)) is greater than W_(max) and vol(S_(full)) is greater than V_(th), and the method proceeds from stages 305 and 306 to stage 308, where the subregion S_(c) is subdivided into D disjoint subregions S_(c,i) according to:

${\overset{D}{\bigcup\limits_{i = 1}}S_{c,i}} = S_{c}$

such that S_(c,i)≠S_(c,j) for i≠j. In other words, subdividing thesubregion S_(c) creates a partition of the subregion S_(c).

FIG. 4 shows an example of the search space S_(full) 104 initially subdivided into 8 virtual, rectangular prism-shaped subregions denoted by S_(i), where i is an integer index ranging from 1 to 8. In the example of FIG. 4, the subregions S_(i) have the same rectangular prism shape and dimensions.

Returning to FIG. 3, in stage 309, the sound source power W(S_(c,i)) is computed for each of the subregions. For example, returning to FIG. 4, the sound source power W(S_(i)) is computed for each of the eight subregions. In stage 310, the pairs [S_(c,i), W(S_(c,i))] are composed by pairing the subregion S_(c,i) with the associated sound source power W(S_(c,i)) and the pairs are combined to repopulate the list L. The method 300 then proceeds back to stage 302.
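The stages of method 300 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: a subregion S is taken to be the tuple (x, y, z, lx, ly, lz) of a corner and three side lengths, every subdivision is into eight equal octants, elements are selected last-in first-out in stage 303, and `toy_power` is a hypothetical positive-valued stand-in for W(S) that simply sums the weights of point sources inside the box. The names `octants`, `localize`, and `toy_power` do not appear in the disclosure.

```python
def volume(S):
    """vol(S) for a box S = (x, y, z, lx, ly, lz)."""
    return S[3] * S[4] * S[5]


def octants(S):
    """Stage 308 (assumed D = 8): split a box into eight equal, disjoint boxes."""
    x, y, z, lx, ly, lz = S
    hx, hy, hz = lx / 2, ly / 2, lz / 2
    return [(x + a * hx, y + b * hy, z + c * hz, hx, hy, hz)
            for a in (0, 1) for b in (0, 1) for c in (0, 1)]


def toy_power(S, sources=((0.7, 0.2, 0.6, 1.0), (0.1, 0.9, 0.1, 0.4))):
    """Hypothetical stand-in for W(S): total weight of point sources inside S."""
    x, y, z, lx, ly, lz = S
    return sum(w for sx, sy, sz, w in sources
               if x <= sx < x + lx and y <= sy < y + ly and z <= sz < z + lz)


def localize(S_full, power, v_th):
    L = [(S_full, power(S_full))]            # stage 301
    W_max, S_max = 0.0, None
    while L:                                 # stage 302: stop when L is empty
        S_c, W_c = L.pop()                   # stages 303-304: select and delete
        if W_c <= W_max:                     # stage 305: cannot beat best so far
            continue
        if volume(S_c) <= v_th:              # stage 306
            W_max, S_max = W_c, S_c          # stage 307
            continue
        for S_i in octants(S_c):             # stage 308
            L.append((S_i, power(S_i)))      # stages 309-310
    return S_max, W_max


S_max, W_max = localize((0.0, 0.0, 0.0, 1.0, 1.0, 1.0), toy_power, v_th=0.002)
```

With the unit cube as the search space and the assumed threshold of 0.002, three rounds of subdivision are needed before a box of side 0.125 passes the volume test in stage 306.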

Now with reference to the example subdivision shown in FIG. 4 and the method 300 shown in FIG. 3, stages 302-305 of the method 300 are repeated for the elements of the list L, until in stage 305 a subregion with an associated sound source power greater than W_(max) is found. In FIG. 4, the subregion S₆ 402 is shaded and represents a subregion for which W(S₆) is greater than W_(max). Method 300 then proceeds to stage 306 where the vol(S₆) of the subregion S₆ is compared to the threshold volume V_(th). In the example of FIG. 4, the vol(S₆) is assumed to be greater than V_(th), so method 300 proceeds to stage 308, where the subregion S₆ is subdivided into 8 subregions S_(6,j), where j is an integer index ranging from 1 to 8, as shown in FIG. 5. In stage 309, the sound source power is computed for each of the 8 subregions S_(6,j) and the list L is populated in stage 310. Stages 302-305 are repeated until, in stage 305, a subregion with an associated sound source power greater than W_(max) is found, which is identified in FIG. 5 as subregion S_(6,4) 502. Method 300 then proceeds to stage 306 where the vol(S_(6,4)) of the subregion S_(6,4) is compared to the threshold volume V_(th). In this example, one more round of performing stages 308-310 and repeating stages 302-305 is carried out to arrive at the subregion 202 shown in FIG. 2. The method 300 stops when the list L is empty, which occurs once every remaining element has either been subdivided or been discarded in stage 305 because its associated sound source power does not exceed W_(max). At that point, S_(max) identifies the subregion with a volume less than or equal to the threshold volume V_(th) that has the largest sound source power.

Most of the computational complexity of this algorithm is in the calculation of W(S) in stage 309. There are several different strategies for choosing the subregion to be removed from the list L in stage 303. For example, in stage 303, the element [S_(c), W(S_(c))] with the largest power measure W(S_(c)) can be selected, or the element [S_(c), W(S_(c))] with the largest power divided by its subregion volume (i.e., W(S_(c))/vol(S_(c))) can be selected. Method 300 finds the approximate coordinate location of the sound source with any strategy used for selecting elements of L, but the method 300 can be performed in an efficient manner with a careful selection strategy performed in stage 303.
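The two selection strategies named above can be sketched as a single helper for stages 303 and 304. The function name `select_and_remove`, the strategy labels "power" and "density", and the box representation (x, y, z, lx, ly, lz) are assumptions made for illustration.

```python
def select_and_remove(L, strategy="power"):
    """Select one element (S, W) of the list L and delete it (stages 303-304).

    strategy="power":   largest W(S)
    strategy="density": largest W(S) / vol(S)
    """
    def volume(S):
        return S[3] * S[4] * S[5]

    if strategy == "power":
        score = lambda entry: entry[1]
    elif strategy == "density":
        score = lambda entry: entry[1] / volume(entry[0])
    else:
        raise ValueError("unknown strategy: " + strategy)

    best = max(range(len(L)), key=lambda n: score(L[n]))
    return L.pop(best)


big = ((0.0, 0.0, 0.0, 2.0, 2.0, 2.0), 4.0)    # power 4.0, density 0.5
small = ((0.0, 0.0, 0.0, 1.0, 1.0, 1.0), 3.0)  # power 3.0, density 3.0
by_power = select_and_remove([big, small], "power")
by_density = select_and_remove([big, small], "density")
```

A linear scan is used here for clarity; for long lists a priority queue would avoid rescanning, at the cost of fixing the strategy when elements are inserted.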

FIG. 6 shows a graph 600 summarizing the iterative process of the branch-and-bound method used to determine the subregion S_(6,4,1). In FIG. 6, node 601 represents the search space 104. The nodes labeled S₁, . . . , S₈ each represent the subregions determined in the first iteration 602 of partitioning the search space 104, as described above with reference to FIG. 4. The subregion S₆ has the largest associated sound source power W(S₆), and, in the second iteration 604, the subregion S₆ is subdivided into eight more subregions represented by nodes labeled S_(6,1), . . . , S_(6,8), as described above with reference to FIG. 5. The subregion S_(6,4) has the largest associated sound source power W(S_(6,4)), and in the third iteration 606, the subregion S_(6,4) is subdivided into eight subregions represented by nodes labeled S_(6,4,1), . . . , S_(6,4,8). Dashed-line enclosure 608 represents the path in the graph 600 from the search space 104 to the subregion S_(6,4,1). In the example represented in FIG. 6, three partition iterations 602, 604, and 606 are used to reach the subregion S_(6,4,1), which has a volume less than the threshold volume V_(th) and in which the sound source 102 is located, in order to assign an approximate coordinate location to the sound source 102.

It is assumed in the example of FIGS. 2-6 that the subregions are inserted into the list L. Stages 303-306 correspond to sequentially testing the subregions. When the W(•) value of a given subregion is smaller than W_(max)=W(S_(max)), that subregion is discarded, because it cannot contain a better solution (subregion with larger power). Otherwise, that subregion is further subdivided, using the same process described above.

Branch-and-bound methods are not limited to partitioning a space into subregions where all of the subregions have the same volume and shape. The subregions can have any suitable three-dimensional geometrical shape and the shapes can vary from iteration to iteration provided the set of subregions obtained in any one iteration partitions the space. Also, the number of subregions partitioning a space can be different at each stage of the branch-and-bound method.

FIG. 7 shows a graph 700 summarizing a general iterative process of a branch-and-bound method for determining a subregion containing a sound source. In the first iteration 702, the search space represented by node 701 is subdivided into four subregions where the subregion represented by node S₂ is determined to have the largest associated sound source power W(S₂). In the second iteration 704, the subregion S₂ is subdivided into five subregions where the subregion represented by node S_(2,2) is determined to have the largest associated sound source power W(S_(2,2)). In the third iteration 706, the subregion S_(2,2) is subdivided into three subregions where the subregion represented by node S_(2,2,3) is determined to have the largest associated sound source power W(S_(2,2,3)). Finally, in the fourth iteration 708, the subregion S_(2,2,3) is subdivided into four subregions where the subregion represented by node S_(2,2,3,2) is determined to have the largest associated sound source power W(S_(2,2,3,2)), and the volume of the subregion S_(2,2,3,2) is less than a threshold volume V_(th).

Examples of techniques for increasing the efficiency of branch-and-bound methods can be found in “Branch and Bound Algorithms—Principles and Examples,” by J. Clausen, http://www.diku.dk/OLD/undervisning/2003e/datV-optimer/JensClausenNoter.pdf; “Parallel branch-and-bound algorithms: survey and synthesis,” B. Gendron and T. G. Crainic, Operations Research, vol. 42(6), pp. 1042-1066, 1994; and “Enumerative approaches to combinatorial optimization,” T. Ibaraki, Annals of Operations Research, vol. 10, 1987.

Note that even though the sound source power W(S) is computed for three-dimensional subregions, certain parameters of W(S) can be pre-computed to reduce real time computation of the sound source power from evaluating three-dimensional integrals to less computationally demanding evaluation of one-dimensional integrals. Derivation of an example mathematical expression for the sound source power W(S) based on one-dimensional integral evaluations is now described.

The sound source power W(S) of a subregion S can be determined by computing the following expression:

$\begin{matrix}{{\overset{\_}{W}(S)} = {\sum\limits_{p = 1}^{P}\; \left\{ {\left\lbrack {{\chi_{p}\left( {\zeta,S} \right)}{\Phi_{p}(\zeta)}} \right\rbrack_{\zeta_{p,S}^{\min}}^{\zeta_{p,S}^{\max}} - {\sum\limits_{k = 1}^{K - 1}\; {{\chi_{p}^{\prime}\left( {\zeta_{k},S} \right)}\left\lbrack {\Phi (\zeta)} \right\rbrack}_{\zeta_{k}}^{\zeta_{k + 1}}}} \right\}}} & {{Equation}\mspace{14mu} (1)}\end{matrix}$

where χ_(p)(ζ,S)=∫_(x∈S) δ(ζ−ξ_(p)(x)) dx is the time delay over the volume of the subregion S, δ(•) is Dirac's delta function,

Φ_(p)(ζ)=∫_(−∞)^(ζ) φ_(p)(z) dz, and

$\chi_{p}^{\prime}(\zeta,S) = \left. \frac{d\chi_{p}(z,S)}{dz} \right|_{z=\zeta}$

with z and ζ representing time. Expressions for the time delay χ_(p)(ζ,S), time derivative of the time delay χ′_(p)(ζ,S), Φ_(p)(ζ), and the parameters P, p, K, ζ_(p,S)^(max), and ζ_(p,S)^(min) of Equation (1) are now described with reference to FIG. 8.

FIG. 8 shows a portion of a search space 802 with an array of M microphones 804 disposed within a wall 806 of the search space 802 and electronically connected to a computing device 808. A set of continuous time audio signals corresponding to the M different microphones represented by

{s₁(t), s₂(t), . . . , s_(M)(t)}

are sent to the computing device 808. The signals s_(i)(t) can be pre-filtered to improve performance. The coordinate location of a sound source 810 is denoted by x 812. Using τ_(i)(x) to represent the time delay at microphone i for sounds emanating from position x 812, sound source localization is based on maximizing the sound source power:

$\begin{aligned} W_{f}(\bar{x}) &= \int_{-\infty}^{\infty} \left[ \sum_{i=1}^{M} s_{i}\left( t - \tau_{i}(\bar{x}) \right) \right]^{2} dt \\ &= \sum_{i=1}^{M} \sum_{j=1}^{M} \int_{-\infty}^{\infty} s_{i}\left( t - \tau_{i}(\bar{x}) \right) s_{j}\left( t - \tau_{j}(\bar{x}) \right) dt \end{aligned} \qquad \text{Equation (2)}$
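A discrete-time reading of Equation (2) can be sketched as follows. The sketch assumes sampled signals, non-negative integer delays measured in samples, and zero signal outside the recorded window; the function names are illustrative and the delays are whatever values of τ_(i) are under test, not a model of acoustic propagation. It evaluates both lines of Equation (2) so that their equality can be checked numerically.

```python
def delayed(sig, d, t):
    """Sample of s(t - d), taking the signal to be zero outside its window."""
    k = t - d
    return sig[k] if 0 <= k < len(sig) else 0.0


def steered_power(signals, delays):
    """First line of Equation (2): sum over t of (sum_i s_i(t - tau_i))^2."""
    n = max(len(s) for s in signals) + max(delays)
    return sum(sum(delayed(s, d, t) for s, d in zip(signals, delays)) ** 2
               for t in range(n))


def pairwise_power(signals, delays):
    """Second line of Equation (2): double sum over microphone pairs."""
    n = max(len(s) for s in signals) + max(delays)
    total = 0.0
    for s_i, d_i in zip(signals, delays):
        for s_j, d_j in zip(signals, delays):
            total += sum(delayed(s_i, d_i, t) * delayed(s_j, d_j, t)
                         for t in range(n))
    return total


# Two toy channels: the second is the first shifted later by two samples.
mic_a = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
mic_b = [0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
aligned = steered_power([mic_a, mic_b], [2, 0])     # delays that line the pulses up
misaligned = steered_power([mic_a, mic_b], [0, 0])  # no steering
```

In this toy case the delays that align the two pulses give the larger value, which is the behavior that maximizing the sound source power relies on.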

Defining

φ_(i,j)(τ)=∫_(−∞) ^(∞) s _(i)(t+τ)s _(j)(t)dt

and the time delay difference between any two microphones i and j as

ξ_(i,j)( x )=τ_(i)( x )−τ_(j)( x )

enables the sound source power at a point x in the search space to be re-written as:

$\begin{matrix}{{W_{f}\left( \overset{\_}{x} \right)} = {\sum\limits_{i = 1}^{M}{\sum\limits_{j = 1}^{M}{\varphi_{i,j}\left( {\xi_{i,j}\left( \overset{\_}{x} \right)} \right)}}}} & {{Equation}\mspace{14mu} (3)}\end{matrix}$

Because φ_(i,j)(τ)=φ_(j,i)(−τ), ξ_(i,j)(x)=−ξ_(j,i)(x), and ξ_(i,i)(x)=0, the sum over each of the P=M(M−1)/2 possible pairs of distinct microphones can now be considered. By defining p=i(i−3)/2+j+1, the subscripts i and j in Equation (3) can be replaced to give the sound source power at the point x:

$\begin{matrix}{{W\left( \overset{\_}{x} \right)} = {{\sum\limits_{i = 2}^{M}{\sum\limits_{j = 1}^{i - 1}{\varphi_{i,j}\left( {\xi_{i,j}\left( \overset{\_}{x} \right)} \right)}}} = {\sum\limits_{p = 1}^{P}{\varphi_{p}\left( {\xi_{p}\left( \overset{\_}{x} \right)} \right)}}}} & {{Equation}\mspace{14mu} (4)}\end{matrix}$

In order to compute W(x) for any point x in the search space, W(x_(n)) is computed for each point of a set of N spatial points {x₁, x₂, . . . , x_(N)} of the search space and the sound source powers of the N points are stored in a look-up table, where φ_(p)(τ) and the time delay difference ξ_(p)(x) are pre-computed for each x_(n). The sound source power W(x) can then be computed using the look-up table and interpolation (see e.g., J. DiBiase, “A high-accuracy, low-latency technique for talker localization in reverberant environments,” Ph.D. dissertation, Brown University, Providence, R.I., May 2000).

On the other hand, methods are directed to determining the approximate coordinate location of the sound source 810 by computing the sound source power in three-dimensional subregions of the search space by partitioning the search space where acoustic activity is detected, as described above with reference to the examples of FIGS. 3-7. Now consider the sound source power of a subregion S given by:

$\begin{matrix}{{\overset{\_}{W}(S)} = {\int_{\overset{\_}{x} \in S}{{W\left( \overset{\_}{x} \right)}{\overset{\_}{x}}}}} \\{= {\sum\limits_{p = 1}^{P}{\int_{\overset{\_}{x} \in S}{{\varphi_{p}\left( {\xi_{p}\left( \overset{\_}{x} \right)} \right)}{\overset{\_}{x}}}}}}\end{matrix}$

The integrand φ_(p) (ξ_(p)( x)) can be expanded as follows

$\begin{aligned} \varphi_{p}\left( \xi_{p}(\bar{x}) \right) &= \int_{-\infty}^{\infty} \varphi_{p}(\zeta)\, \delta\left( \zeta - \xi_{p}(\bar{x}) \right) d\zeta \\ &= \int_{\zeta_{p,S}^{\min}}^{\zeta_{p,S}^{\max}} \varphi_{p}(\zeta)\, \delta\left( \zeta - \xi_{p}(\bar{x}) \right) d\zeta \end{aligned}$

where δ(ζ) is the Dirac delta function, enabling the sound source power of a subregion S to be characterized as:

${\overset{\_}{W}(S)} = {\sum\limits_{p = 1}^{P}{\int_{\zeta_{p,S}^{\min}}^{\zeta_{p,S}^{\max}}{{{\varphi_{p}(\zeta)}\left\lbrack {\int_{\overset{\_}{x} \in S}{{\delta \left( {\zeta - {\xi_{p}\left( \overset{\_}{x} \right)}} \right)}{\overset{\_}{x}}}} \right\rbrack}{\zeta}}}}$

where the time delay is given by

$\chi_{p}(\zeta,S) = \int_{\bar{x} \in S} \delta\left( \zeta - \xi_{p}(\bar{x}) \right) d\bar{x} \qquad \text{Equation (5)}$

The sound source power of a subregion S can also be re-written as a sum of one-dimensional integrals:

$\begin{matrix}{{\overset{\_}{W}(S)} = {\sum\limits_{p = 1}^{P}{\int_{\zeta_{p,S}^{\min}}^{\zeta_{p,S}^{\max}}{{\chi_{p}\left( {\zeta,V} \right)}{\varphi_{p}(\zeta)}{\zeta}}}}} & {{Equation}\mspace{14mu} (6)}\end{matrix}$

Note that in Equation (6), calculation of the sound source power can be accomplished by evaluating a one-dimensional integral in time, but the integrand of each term includes the time delay χ_(p)(ζ,S), which, according to Equation (5), is a three-dimensional integral that depends on microphone positions and the geometry of the subregion. However, the time delay χ_(p)(ζ,S) does not depend on the sound emanating from the subregion S. As a result, rather than computing the time delay χ_(p)(ζ,S) for each term of Equation (6) each time the sound source power W(S) is computed, the time delay χ_(p)(ζ,S) can be computed for each of the subregions prior to initiating sound source localization. For example, after the microphones have been mounted at fixed locations and the subregions that subdivide the search space are known in advance, the time delay χ_(p)(ζ,S) for each of the subregions can be computed and stored in a look-up table. In other words, because the time delay χ_(p)(ζ,S) depends only on integrating over subregion S and the spatial arrangement of microphones, χ_(p)(ζ,S) can be pre-computed before sound source localization begins and stored in a look-up table. By computing the time delay in advance for each subregion and storing the time delay for each subregion in a look-up table, the complexity of computing the sound source power W(S) of a subregion according to Equation (6) is significantly reduced from having to evaluate a three-dimensional integral in space (i.e., χ_(p)(ζ,S)) and a one-dimensional integral in time to simply evaluating the one-dimensional integral in time.
As a result, the computational demand of computing the sound source power W(S) for a large number of points using Equation (4) is significantly reduced, because most of the complexity and accuracy problems related to volumetric integrals are eliminated by pre-computing χ_(p)(ζ,S), and because the one-dimensional integrals in Equation (6) can be computed more efficiently if χ_(p)(ζ,S) can be approximated as a piecewise linear function. In addition, as long as the geometry of the search space does not change and the positions of the microphones remain fixed, the time delays χ_(p)(ζ,S) stored in the look-up table can be used repeatedly in performing sound source localization.
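One way to picture the pre-computation of χ_(p)(ζ,S) is as a volume-weighted histogram of the time delay difference ξ_(p)(x) over points x in the subregion. The sketch below is an illustrative approximation of Equation (5), not the disclosed procedure. It assumes free-field propagation with τ_(i)(x) equal to the distance from x to microphone i divided by an assumed speed of sound of 343 m/s, a box-shaped subregion S=(x, y, z, lx, ly, lz), and a regular grid of sample points; the names `tdoa` and `chi_histogram` and the bin and grid counts are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second; an assumed value


def tdoa(x, mic_i, mic_j, c=SPEED_OF_SOUND):
    """Assumed model of xi_p(x) = tau_i(x) - tau_j(x) for straight-line propagation."""
    return (math.dist(x, mic_i) - math.dist(x, mic_j)) / c


def chi_histogram(S, mic_i, mic_j, n_bins=32, n_side=12):
    """Approximate chi_p(zeta, S) on n_bins equal bins between the extreme delays.

    Each grid cell of S adds its volume, divided by the bin width, to the bin
    containing its delay difference, so that the histogram integrates to vol(S).
    """
    x0, y0, z0, lx, ly, lz = S
    cell_vol = (lx * ly * lz) / n_side ** 3
    samples = []
    for a in range(n_side):
        for b in range(n_side):
            for d in range(n_side):
                point = (x0 + (a + 0.5) * lx / n_side,
                         y0 + (b + 0.5) * ly / n_side,
                         z0 + (d + 0.5) * lz / n_side)
                samples.append(tdoa(point, mic_i, mic_j))
    z_min, z_max = min(samples), max(samples)
    width = (z_max - z_min) / n_bins or 1e-12  # guard against a zero-width range
    chi = [0.0] * n_bins
    for s in samples:
        k = min(int((s - z_min) / width), n_bins - 1)
        chi[k] += cell_vol / width
    return z_min, width, chi


# One microphone pair and one unit-cube subregion (coordinates in metres).
z_min, width, chi = chi_histogram((0.5, 1.0, 0.0, 1.0, 1.0, 1.0),
                                  (0.0, 0.0, 1.0), (2.0, 0.0, 1.0))
```

Because the result depends only on the microphone positions and the subregion, a table keyed by (p, S) could be filled once and reused, which is the property the paragraph above relies on.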

The one-dimensional integrals in Equation (6) can be eliminated by first expanding the integrals using integration by parts to obtain:

$\int_{\zeta_{p,S}^{\min}}^{\zeta_{p,S}^{\max}} \chi_{p}(\zeta,S)\, \varphi_{p}(\zeta)\, d\zeta = \left[ \chi_{p}(\zeta,S)\, \Phi_{p}(\zeta) \right]_{\zeta_{p,S}^{\min}}^{\zeta_{p,S}^{\max}} - \int_{\zeta_{p,S}^{\min}}^{\zeta_{p,S}^{\max}} \Phi(\zeta)\, \chi_{p}^{\prime}(\zeta,S)\, d\zeta \qquad \text{Equation (7)}$

where

Φ_(p)(ζ)=∫_(−∞) ^(ζ)φ_(p)(z)dz

can be computed after φ_(p) (z) is computed, and

${\chi_{p}^{\prime}\left( {\zeta,S} \right)} = \left. \frac{{\chi_{p}\left( {z,S} \right)}}{z} \right|_{z = \zeta}$

which depends only on the subregion S and microphone geometry. Second, if χ_(p)(ζ,S) can be defined by a set of linear pieces with K breakpoints {ζ₁=ζ_(p,S)^(min), ζ₂, . . . , ζ_(K)=ζ_(p,S)^(max)}, then χ′_(p)(ζ,S) is constant on each of the K−1 intervals between consecutive breakpoints, and the integrals of Equation (7) can be written as follows:

$\int_{\zeta_{p,S}^{\min}}^{\zeta_{p,S}^{\max}} \chi_{p}(\zeta,S)\, \varphi_{p}(\zeta)\, d\zeta = \left[ \chi_{p}(\zeta,S)\, \Phi_{p}(\zeta) \right]_{\zeta_{p,S}^{\min}}^{\zeta_{p,S}^{\max}} - \sum_{k=1}^{K-1} \chi_{p}^{\prime}(\zeta_{k},S) \left[ \Phi(\zeta) \right]_{\zeta_{k}}^{\zeta_{k+1}} \qquad \text{Equation (8)}$

Substituting Equation (8) into Equation (6) gives Equation (1), which can be used to compute the sound source power of a subregion S.
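The integration-by-parts step can be checked numerically for a single pair p. The sketch below is a hedged illustration: it assumes that the bracketed quantity multiplying χ′_(p) in Equation (8) is evaluated as the integral of Φ_(p) over the interval between ζ_(k) and ζ_(k+1), that is, as the difference of an antiderivative of Φ_(p), which is what integration by parts yields when χ′_(p) is constant on that interval. The test functions (φ(z)=2z on the interval from 0 to 1, a tent-shaped χ) and all function names are assumptions chosen so that the exact answer, 0.25, is easy to verify by hand.

```python
def by_parts_term(breaks, chi, Phi, Phi_antiderivative):
    """One p-term: [chi * Phi] at the end points minus the sum over linear pieces.

    breaks: zeta_1 .. zeta_K; chi: values of chi at those breakpoints.
    """
    boundary = chi[-1] * Phi(breaks[-1]) - chi[0] * Phi(breaks[0])
    pieces = 0.0
    for k in range(len(breaks) - 1):
        slope = (chi[k + 1] - chi[k]) / (breaks[k + 1] - breaks[k])  # chi' on piece k
        pieces += slope * (Phi_antiderivative(breaks[k + 1])
                           - Phi_antiderivative(breaks[k]))
    return boundary - pieces


def interp(breaks, values, z):
    """Piecewise-linear chi(z) through the given breakpoints."""
    for k in range(len(breaks) - 1):
        if breaks[k] <= z <= breaks[k + 1]:
            t = (z - breaks[k]) / (breaks[k + 1] - breaks[k])
            return values[k] + t * (values[k + 1] - values[k])
    return 0.0


def direct_term(breaks, chi, phi, steps=2000):
    """Midpoint-rule evaluation of the integral of chi * phi, for comparison."""
    a, b = breaks[0], breaks[-1]
    h = (b - a) / steps
    return h * sum(interp(breaks, chi, a + (n + 0.5) * h) * phi(a + (n + 0.5) * h)
                   for n in range(steps))


breaks = [0.0, 0.5, 1.0]   # K = 3 breakpoints, two linear pieces
chi = [0.0, 0.5, 0.0]      # tent-shaped chi
closed_form = by_parts_term(breaks, chi,
                            Phi=lambda z: z * z,                    # integral of 2z
                            Phi_antiderivative=lambda z: z ** 3 / 3)
numeric = direct_term(breaks, chi, phi=lambda z: 2.0 * z)
```

Both routes give 0.25 for this toy case, while the closed form needs only K evaluations of tabulated functions instead of a quadrature over ζ.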

Note that the expression of the sound source power W(S) provided in Equation (1) results from having applied integration by parts to each of the integrals in Equation (6). Embodiments are not limited to the expression of the sound source power W(S) expressed in Equation (1). Other integration techniques can be used to evaluate the one-dimensional integrals in Equation (6), resulting in a variety of different expressions for the sound source power W(S). The expression for the sound source power W(S) given by Equation (1) represents one of many such expressions.

With the volumetric computation of W(S), branch-and-bound search methods developed for discrete and combinatorial optimization can be used to find the subregion with a volume that is less than the threshold volume V_(th). In other words, branch-and-bound search methods can be used to find the maximum W(S) for a set of subregions of a search space, because the sound source power W(S) is positive valued. Thus, the total integral value for a subregion S is always an upper bound on the integral of any smaller subregion contained within S. For example, returning to the examples of FIGS. 4 and 5, the sound source power of the subregion S₆ 402 is greater than the sound source power of each of the subregions S_(6,j), for j equal to 1 through 8.
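The upper-bound property can be written out in one line. Assuming, as stated above, that the integrand W(x) is non-negative, and using the fact that the subregions S_(c,i) partition S_(c), the integral over S_(c) splits into a sum of non-negative terms:

$\bar{W}(S_{c}) = \int_{\bar{x} \in S_{c}} W(\bar{x})\, d\bar{x} = \sum_{i=1}^{D} \int_{\bar{x} \in S_{c,i}} W(\bar{x})\, d\bar{x} = \sum_{i=1}^{D} \bar{W}(S_{c,i}) \ \ge\ \bar{W}(S_{c,j}) \quad \text{for each } j = 1, \ldots, D$

This is why a subregion whose power does not exceed W_(max) can be discarded in stage 305 without examining any of the subregions it contains.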

Assuming point sound sources and that the subregions can be subdivided until reaching a pre-defined minimum subregion with volume V_(min), the sound source power can continue to be calculated for each subregion. The efficiency of branch-and-bound methods, such as the branch-and-bound method described above with reference to FIGS. 3-6, is predicated on the following: if the sound source power of a subregion S is smaller than the sound source power of the known minimum subregion, then the subregion S is not subdivided because the subregion S does not contain the optimal sound source power.

Methods for determining an approximate coordinate location of a sound source described above are performed using a computing device. The computing device can be a desktop computer, a laptop, or any other suitable computational device that can be used to process the audio signals generated by an array of microphones. FIG. 9 shows a schematic representation of a computing device 900. The device 900 includes one or more processors 902, such as a central processing unit; one or more network interfaces 904, such as a Local Area Network (LAN), a wireless 802.11x LAN, a 3G mobile WAN or a WiMax WAN; a microphone array interface 906; and one or more computer-readable mediums 908. Each of these components is operatively coupled to one or more buses 910. For example, the bus 910 can be an EISA, a PCI, a USB, a FireWire, a NuBus, or a PDS.

The computer-readable medium 908 can be any suitable medium that provides machine-readable instructions to the processor 902 for execution. For example, the computer-readable medium 908 can be non-volatile media, such as an optical disk, a magnetic disk, or a magnetic disk drive; volatile media, such as memory; or transmission media, such as coaxial cables, copper wire, and fiber optics. The computer-readable medium 908 can also store other kinds of machine-readable instructions, including word processors, browsers, email, Instant Messaging, media players, and telephony software.

The computer-readable medium 908 may also store an operating system 912, such as Mac OS, MS Windows, Unix, or Linux; network applications 914; and machine-readable instructions for performing sound source localization 916. The operating system 912 can be multi-user, multiprocessing, multitasking, multithreading, real-time and the like. The operating system 912 can also perform basic tasks such as recognizing input from input devices, such as a keyboard, a keypad, or a mouse; sending output to a projector and a camera; keeping track of files and directories on medium 908; controlling peripheral devices, such as disk drives, printers, and image capture devices; and managing traffic on the one or more buses 910. The network applications 914 include various components for establishing and maintaining network connections, such as machine-readable instructions for implementing communication protocols including TCP/IP, HTTP, Ethernet, USB, and FireWire.

The sound localization medium 916 provides various machine-readable instruction components for determining a coordinate location of a sound source, as described above. In certain examples, some or all of the processes performed by the sound localization medium 916 can be integrated into the operating system 912. In certain examples, the processes can be at least partially implemented in digital electronic circuitry, or in computer hardware, or in any combination thereof.

The foregoing description, for purposes of explanation, used specific nomenclature to provide a thorough understanding of the disclosure. However, it will be apparent to one skilled in the art that the specific details are not required in order to practice the systems and methods described herein. The foregoing descriptions of specific embodiments are presented for purposes of illustration and description. They are not intended to be exhaustive of or to limit this disclosure to the precise forms described. Obviously, many modifications and variations are possible in view of the above teachings. The embodiments are shown and described in order to best explain the principles of this disclosure and practical applications, to thereby enable others skilled in the art to best utilize this disclosure and various embodiments with various modifications as are suited to the particular use contemplated. It is intended that the scope of this disclosure be defined by the following claims and their equivalents:

1. A method for locating a sound source using a computing device, the method comprising: subdividing a space into subregions; computing a sound source power for each subregion; determining a largest of the sound source power of the sound source energies; and outputting the subregion having the largest sound source power when the volume of the subregion is less than a threshold volume, otherwise repeat subdividing, computing, and determining for the subregion having the largest sound source power.
 2. The method of claim 1, wherein determining a largest of the sound source power of the sound source energies further comprises: populating a list of subregions and associated sound source powers; and selecting a subregion and associated sound source power from the list with the largest sound source power.
 3. The method of claim 1, wherein determining a largest of the sound source power of the sound source energies further comprises: populating a list of subregions and associated sound source powers; and selecting a subregion and associated sound source power from the list with the largest power divided by the subregion volume.
 4. The method of claim 1, wherein computing the sound source power further comprises computing: $\bar{W}(S) = \sum_{p=1}^{P} \left\{ \left[ \chi_p(\zeta, S)\,\Phi_p(\zeta) \right]_{\zeta_{p,S}^{\min}}^{\zeta_{p,S}^{\max}} - \sum_{k=1}^{K-1} \chi_p'(\zeta_k, S) \left[ \Phi(\zeta) \right]_{\zeta_k}^{\zeta_{k+1}} \right\}$ where $\chi_p(\zeta, S) = \int_{x \in S} \delta(\zeta - \xi_p(x))\,dx$ is the time delay over the subregion S, $\Phi_p(\zeta) = \int_{-\infty}^{\zeta} \varphi_p(z)\,dz$, $\chi_p'(\zeta, S) = \left. \frac{d\chi_p(z, S)}{dz} \right|_{z = \zeta}$, K is an index for the set $\{\zeta_1 = \zeta_{p,S}^{\min}, \zeta_2, \ldots, \zeta_K = \zeta_{p,S}^{\max}\}$ of times, $P = M(M-1)/2$ is the number of microphone pairs in the microphone array, and $p = i(i-3)/2 + j + 1$ is an index representing the ith and jth microphone pair.
 5. The method of claim 1, further comprising pre-computing a time delay for each of the subregions of the space prior to locating a sound source.
 6. The method of claim 1, wherein outputting the subregion having the largest sound source power when the volume of the subregion is less than a threshold volume further comprises comparing the volume of the subregion having the largest sound source power to the threshold volume.
 7. A computer-readable medium having instructions encoded thereon for locating a sound source, the instructions enabling at least one processor to perform the operations of: subdividing a space into subregions; computing a sound source power for each subregion; determining a largest of the sound source power of the sound source energies; and outputting the subregion having the largest sound source power when the volume of the subregion is less than a threshold volume, otherwise repeat subdividing, computing, and determining for the subregion having the largest sound source power.
 8. The medium of claim 7, wherein determining a largest of the sound source power of the sound source energies further comprises: populating a list of subregions and associated sound source powers; and selecting a subregion and associated sound source power from the list with the largest sound source power.
 9. The medium of claim 7, wherein determining a largest of the sound source power of the sound source energies further comprises: populating a list of subregions and associated sound source powers; and selecting a subregion and associated sound source power from the list with the largest power divided by the subregion volume.
 10. The medium of claim 7, wherein computing the sound source power further comprises computing: $\bar{W}(S) = \sum_{p=1}^{P} \left\{ \left[ \chi_p(\zeta, S)\,\Phi_p(\zeta) \right]_{\zeta_{p,S}^{\min}}^{\zeta_{p,S}^{\max}} - \sum_{k=1}^{K-1} \chi_p'(\zeta_k, S) \left[ \Phi(\zeta) \right]_{\zeta_k}^{\zeta_{k+1}} \right\}$ where $\chi_p(\zeta, S) = \int_{x \in S} \delta(\zeta - \xi_p(x))\,dx$ is the time delay over the subregion S, $\Phi_p(\zeta) = \int_{-\infty}^{\zeta} \varphi_p(z)\,dz$, $\chi_p'(\zeta, S) = \left. \frac{d\chi_p(z, S)}{dz} \right|_{z = \zeta}$, K is an index for the set $\{\zeta_1 = \zeta_{p,S}^{\min}, \zeta_2, \ldots, \zeta_K = \zeta_{p,S}^{\max}\}$ of times, $P = M(M-1)/2$ is the number of microphone pairs in the microphone array, and $p = i(i-3)/2 + j + 1$ is an index representing the ith and jth microphone pair.
 11. The medium of claim 7, further comprising pre-computing a time delay for each of the subregions of the space prior to execution of the instructions for locating a sound source.
 12. The medium of claim 7, wherein outputting the subregion having the largest sound source power when the volume of the subregion is less than a threshold volume further comprises comparing the volume of the subregion having the largest sound source power to the threshold volume.
 13. A sound source localization system comprising: an array of microphones disposed to capture sounds of a space; a computing device electronically connected to the microphones and including: at least one processor, and memory having instructions stored therein for locating a sound source, the instructions enabling the at least one processor to perform the operations of: subdividing the space into subregions; computing a sound source power for each subregion; determining a largest of the sound source power of the sound source energies; and outputting the subregion having the largest sound source power when the volume of the subregion is less than a threshold volume, otherwise repeat subdividing, computing, determining and comparing for the subregion.
 14. The system of claim 13, wherein determining a largest of the sound source power of the sound source energies further comprises: populating a list of subregions and associated sound source powers; and selecting a subregion and associated sound source power from the list with the largest sound source power.
 15. The system of claim 13, wherein determining a largest of the sound source power of the sound source energies further comprises: populating a list of subregions and associated sound source powers; and selecting a subregion and associated sound source power from the list with the largest power divided by the subregion volume.
 16. The system of claim 13, wherein the array of microphones are randomly distributed.
 17. The system of claim 13, wherein the array of microphones are regularly distributed.
 18. The system of claim 13, wherein computing the sound source power further comprises computing: $\bar{W}(S) = \sum_{p=1}^{P} \left\{ \left[ \chi_p(\zeta, S)\,\Phi_p(\zeta) \right]_{\zeta_{p,S}^{\min}}^{\zeta_{p,S}^{\max}} - \sum_{k=1}^{K-1} \chi_p'(\zeta_k, S) \left[ \Phi(\zeta) \right]_{\zeta_k}^{\zeta_{k+1}} \right\}$ where $\chi_p(\zeta, S) = \int_{x \in S} \delta(\zeta - \xi_p(x))\,dx$ is the time delay over the subregion S, $\Phi_p(\zeta) = \int_{-\infty}^{\zeta} \varphi_p(z)\,dz$, $\chi_p'(\zeta, S) = \left. \frac{d\chi_p(z, S)}{dz} \right|_{z = \zeta}$, K is an index for the set $\{\zeta_1 = \zeta_{p,S}^{\min}, \zeta_2, \ldots, \zeta_K = \zeta_{p,S}^{\max}\}$ of times, $P = M(M-1)/2$ is the number of microphone pairs in the microphone array, and $p = i(i-3)/2 + j + 1$ is an index representing the ith and jth microphone pair.
 19. The system of claim 13, further comprising pre-computing a time delay for each of the subregions of the space, prior to execution of the instructions for locating a sound source.
 20. The system of claim 13, wherein outputting the subregion having the largest sound source power when the volume of the subregion is less than a threshold volume further comprises comparing the volume of the subregion having the largest sound source power to the threshold volume.
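
The microphone pair index p = i(i−3)/2 + j + 1 recited in the claims can be checked numerically. The following Python sketch assumes 1-based microphone indices with i > j, an ordering the claims do not state explicitly, and the function names are hypothetical. Under that assumption, the sketch confirms that the index runs over 1 through M(M−1)/2 without gaps or repeats.

```python
from typing import List


def pair_index(i: int, j: int) -> int:
    """Index p = i(i-3)/2 + j + 1 for a microphone pair (i, j).

    Assumes 1-based indices with i > j >= 1 (an assumption of this sketch).
    i and i-3 have opposite parity, so i*(i-3) is always even.
    """
    if not (i > j >= 1):
        raise ValueError("expected 1-based indices with i > j >= 1")
    return i * (i - 3) // 2 + j + 1


def all_pair_indices(M: int) -> List[int]:
    """Pair indices for every pair in an array of M microphones."""
    return [pair_index(i, j) for i in range(2, M + 1) for j in range(1, i)]


# For M = 4 microphones there are 4 * 3 / 2 = 6 pairs.
indices = all_pair_indices(4)
print(indices)
```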